OBJECTIVE PRIORS FOR THE BIVARIATE NORMAL MODEL

By James O. Berger 1 and Dongchu Sun 2 

Duke University and University of Missouri-Columbia 

Study of the bivariate normal distribution raises the full range of 
issues involving objective Bayesian inference, including the different 
types of objective priors (e.g., Jeffreys, invariant, reference, match- 
ing), the different modes of inference (e.g., Bayesian, frequentist, fidu- 
cial) and the criteria involved in deciding on optimal objective priors 
(e.g., ease of computation, frequentist performance, marginalization 
paradoxes). Summary recommendations as to optimal objective pri- 
ors are made for a variety of inferences involving the bivariate normal 
distribution. 

In the course of the investigation, a variety of surprising results 
were found, including the availability of objective priors that yield 
exact frequentist inferences for many functions of the bivariate normal 
parameters, including the correlation coefficient. 

1. Introduction and prior distributions. 

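1.1. Notation and problem statement. The bivariate normal distribution of $(x_1, x_2)'$ has mean parameters $\mu = (\mu_1, \mu_2)'$ and covariance matrix

$\Sigma = \begin{pmatrix} \sigma_1^2 & \rho\sigma_1\sigma_2 \\ \rho\sigma_1\sigma_2 & \sigma_2^2 \end{pmatrix}$,

where $\rho$ is the correlation between $x_1$ and $x_2$. The density is

$f(x_1, x_2 \mid \mu, \Sigma) = \frac{1}{2\pi\sigma_1\sigma_2\sqrt{1-\rho^2}} \exp\left\{ -\frac{\sigma_2^2 (x_1 - \mu_1)^2 + \sigma_1^2 (x_2 - \mu_2)^2 - 2\rho\sigma_1\sigma_2 (x_1 - \mu_1)(x_2 - \mu_2)}{2\sigma_1^2\sigma_2^2(1-\rho^2)} \right\}$.

The data consists of an independent random sample $X = (x_k = (x_{1k}, x_{2k})', k = 1, \ldots, n)$ of size $n \geq 3$, for which the sufficient statistics are

(1) $\bar{x} = (\bar{x}_1, \bar{x}_2)'$ and $S = \sum_{k=1}^{n} (x_k - \bar{x})(x_k - \bar{x})' = \begin{pmatrix} s_{11} & r\sqrt{s_{11}s_{22}} \\ r\sqrt{s_{11}s_{22}} & s_{22} \end{pmatrix}$,

where, for $i, j = 1, 2$,

$\bar{x}_i = n^{-1}\sum_{k=1}^{n} x_{ik}$, $s_{ij} = \sum_{k=1}^{n} (x_{ik} - \bar{x}_i)(x_{jk} - \bar{x}_j)$ and $r = \frac{s_{12}}{\sqrt{s_{11}s_{22}}}$.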
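As a concrete illustration of the sufficient statistics in (1), the following minimal sketch computes $\bar{x}_1$, $\bar{x}_2$, $s_{11}$, $s_{22}$ and $r$ from a list of observation pairs. The function name `sufficient_statistics` and the toy data are hypothetical and serve only this illustration.

```python
import math


def sufficient_statistics(data):
    """Return (xbar1, xbar2, s11, s22, r) for a list of (x1, x2) pairs.

    s11, s22 and s12 are centered sums of squares and cross products
    (not divided by n), matching the definition of S in (1).
    """
    n = len(data)
    if n < 3:
        raise ValueError("need n >= 3 observations")
    xbar1 = sum(x1 for x1, _ in data) / n
    xbar2 = sum(x2 for _, x2 in data) / n
    s11 = sum((x1 - xbar1) ** 2 for x1, _ in data)
    s22 = sum((x2 - xbar2) ** 2 for _, x2 in data)
    s12 = sum((x1 - xbar1) * (x2 - xbar2) for x1, x2 in data)
    r = s12 / math.sqrt(s11 * s22)
    return xbar1, xbar2, s11, s22, r


# Toy data, chosen only so that the result is easy to check by hand.
sample = [(1.0, 2.0), (2.0, 1.0), (3.0, 4.0), (4.0, 3.0)]
print(sufficient_statistics(sample))
```

For the toy data, both centered sums of squares equal 5 and the cross product sum equals 3, so $r = 0.6$.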



We will denote prior densities as $\pi(\mu_1, \mu_2, \sigma_1, \sigma_2, \rho)$, and the corresponding posterior densities as $\pi(\mu_1, \mu_2, \sigma_1, \sigma_2, \rho \mid X)$ (all with respect to $d\mu_1\, d\mu_2\, d\sigma_1\, d\sigma_2\, d\rho$).

We consider objective inference for parameters of the bivariate normal 
distribution and functions of these parameters, with special focus on devel- 
opment of objective confidence or credible sets. Section 1.2 introduces many 
of the key issues to be covered, through a summary of some of the most in- 
teresting results involving priors yielding exact frequentist procedures; this 
section also raises interesting historical and philosophical issues. For easy ac- 
cess, Section 1.3 presents our summary recommendations as to which priors 
to utilize. 

Often, the posteriors for the recommended priors are essentially avail- 
able in computational closed form, allowing direct Monte Carlo simulation. 
Section 2 provides simple accept-reject schemes for computing with the rec- 
ommended priors in other cases. Sections 3 and 4 develop the needed theory, 
concerning what are called reference priors and matching priors, respectively, 
and also present various simulations that were conducted to enable summary 
recommendations to be made. 

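Notation: In addition to $(\mu_1, \mu_2, \sigma_1, \sigma_2, \rho)$, the following parameters will be considered:

(2) $\eta_1 = \frac{1}{\sigma_1}$, $\eta_2 = \frac{1}{\sigma_2\sqrt{1-\rho^2}}$, $\eta_3 = -\frac{\rho}{\sigma_1\sqrt{1-\rho^2}}$,

(3) $\theta_1 = \frac{\rho\sigma_2}{\sigma_1}$, $\theta_2 = \sigma_2^2(1-\rho^2)$, $\theta_3 = |\Sigma| = \sigma_1^2\sigma_2^2(1-\rho^2)$,

(4) $\theta_4 = \frac{\sigma_2\sqrt{1-\rho^2}}{\sigma_1}$, $\theta_5 = \frac{\mu_1}{\sigma_1}$, $\theta_6 = \sigma_1\sigma_2$, $\theta_7 = \frac{\sigma_2}{\sigma_1}$, $\theta_8 = \frac{\mu_2}{\sigma_2}$,

(5) $\theta_9 = \sigma_{12} = \rho\sigma_1\sigma_2$, $\theta_{10} = \sigma_1^2 + \sigma_2^2 - 2\rho\sigma_1\sigma_2$,

(6) $\theta_{11} = d'\Sigma d$ [$d' = (d_1, d_2)$ not proportional to $(0, 1)$],

(7) $\lambda_1 = ch_{\max}(\Sigma)$, $\lambda_2 = ch_{\min}(\Sigma)$.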

Some of these parameters have straightforward statistical interpretations. Since $(x_2 \mid x_1, \mu, \Sigma) \sim N(\mu_2 + \theta_1(x_1 - \mu_1), \theta_2)$, it is clear that $\theta_1$ is a regression coefficient, $\theta_2$ is a conditional variance, and $\eta_2^2$ is the corresponding precision. For the marginal distribution of $x_1$, $\eta_1^2$ is the precision and $\theta_5$ is the reciprocal of the coefficient of variation. $\theta_3$ is usually called the generalized variance. $(\eta_1, \eta_2, \eta_3)$ gives a type of Cholesky decomposition of the precision matrix $\Sigma^{-1}$ [see (13) in Section 2.1]. $\theta_{10}$ is the variance of $x_1 - x_2$, and $\theta_{11}$ is the variance of $d_1 x_1 + d_2 x_2$. Finally, $\lambda_1$ and $\lambda_2$ are the largest and smallest eigenvalues of $\Sigma$.

Technical issue. We will assume that $|\rho| < 1$ and $|r| < 1$ in virtually all expressions and results that follow. This is because, if either equals 1 in absolute value, then $\rho = \{\text{sign of } r\}$ with probability 1 (either frequentist or Bayesian posterior, as relevant). Indeed, the situation then essentially collapses to the univariate version of the problem, which is standard.

1.2. Matching, constructive posteriors and fiducial distributions. The bi- 
variate normal distribution has been extensively studied from frequentist, 
fiducial and objective Bayesian perspectives. Table 1 summarizes a number 
of interesting results. 

• For a variety of parameters, it presents objective priors (discussed below) for which the resulting Bayesian posterior credible sets of level $1 - \alpha$ are also exact frequentist confidence sets at the same level; in this case, the priors are said to be exact frequentist matching. This is a very desirable situation: see [23] and [2] for general discussion and the many earlier references.

• For $\mu_1$, $\mu_2$, $\sigma_1$, $\sigma_2$ and $\rho$, the constructive posterior distributions are also the fiducial distributions for the parameters, as found in Fisher [14, 15] and [21].

• Posterior distributions are presented as constructive random distributions, that is, by a description of how to simulate from them. Thus to simulate from the posterior distribution of $\sigma_1$, given the data (actually, only $s_{11}$ is needed), one draws independent $\chi^2_{n-1}$ random variables and simply computes the corresponding $\sqrt{s_{11}/\chi^2_{n-1}}$; this yields an independent sample from the fiducial/posterior distribution of $\sigma_1$.
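The simulation recipe for $\sigma_1$ in the last bullet can be sketched in a few lines. This is only an illustration under the stated recipe: the function name `draw_sigma1_posterior` is hypothetical, and the chi-squared draws are generated through the standard identity that a chi-squared variable with $k$ degrees of freedom is a gamma variable with shape $k/2$ and scale 2.

```python
import math
import random


def draw_sigma1_posterior(s11, n, size, rng=None):
    """Draw `size` independent values of sqrt(s11 / chi2_{n-1}).

    Each chi-squared draw with n - 1 degrees of freedom is obtained as a
    gamma draw with shape (n - 1) / 2 and scale 2.
    """
    rng = rng or random.Random()
    draws = []
    for _ in range(size):
        chi2 = rng.gammavariate((n - 1) / 2.0, 2.0)
        draws.append(math.sqrt(s11 / chi2))
    return draws


# Example: posterior draws of sigma_1 when s11 = 9 and n = 10.
sigma1_draws = sorted(draw_sigma1_posterior(9.0, 10, 5000, random.Random(0)))
# Empirical 5% and 95% posterior quantiles give an equal-tailed 90% interval.
print(sigma1_draws[250], sigma1_draws[4750])
```

Sorting the draws and reading off empirical quantiles, as in the last lines, is one simple way to turn the constructive posterior into a credible interval.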

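Table 1

Parameters with exact matching priors of the form $\pi_{ab}$, and associated constructive posteriors: Here $Z^*$ is a standard normal random variable, and $\chi^{2*}_{n-1}$ and $\chi^{2*}_{n-2}$ are chi-squared random variables with the indicated degrees of freedom, all random variables being independent. For $\mu_1$, $\mu_2$, $\sigma_1$, $\sigma_2$ and $\rho$, the indicated posteriors are also fiducial distributions

Parameter | Prior | Posterior
$\mu_1$ | $\pi_{1b}, \forall b$ (including $\pi_J$ and $\pi_H$) | $\bar{x}_1 + \frac{Z^*}{\sqrt{\chi^{2*}_{n-1}}}\sqrt{\frac{s_{11}}{n}}$
$\mu_2$ | $\pi_J = \pi_{10}$ | $\bar{x}_2 + \frac{Z^*}{\sqrt{\chi^{2*}_{n-1}}}\sqrt{\frac{s_{22}}{n}}$
$d'(\mu_1, \mu_2)'$, $d \in R^2$ | $\pi_J = \pi_{10}$ and $\tilde{\pi}_H$ (see Table 4) | $d'(\bar{x}_1, \bar{x}_2)' + \frac{Z^*}{\sqrt{\chi^{2*}_{n-1}}}\sqrt{\frac{d'Sd}{n}}$
$\sigma_1$ | $\pi_{1b}, \forall b$ (including $\pi_J$ and $\pi_H$) | $\sqrt{s_{11}/\chi^{2*}_{n-1}}$
$\sigma_2$ | $\pi_J = \pi_{10}$ | $\sqrt{s_{22}/\chi^{2*}_{n-1}}$
$\rho$ | $\pi_H = \pi_{12}$ | $\psi\left(-\frac{Z^*}{\sqrt{\chi^{2*}_{n-1}}} + \frac{\sqrt{\chi^{2*}_{n-2}}}{\sqrt{\chi^{2*}_{n-1}}}\frac{r}{\sqrt{1-r^2}}\right)$, where $\psi(y) = y/\sqrt{1+y^2}$
$\eta_3 = -\frac{\rho}{\sigma_1\sqrt{1-\rho^2}}$ | $\pi_{a2}, \forall a$ (including $\pi_H$) | $\frac{Z^*}{\sqrt{s_{11}}} - \frac{\sqrt{\chi^{2*}_{n-2}}}{\sqrt{s_{11}}}\frac{r}{\sqrt{1-r^2}}$
$\theta_1 = \frac{\rho\sigma_2}{\sigma_1}$ | $\pi_{a2}, \forall a$ (including $\pi_H$) | $r\sqrt{\frac{s_{22}}{s_{11}}} - \frac{Z^*}{\sqrt{\chi^{2*}_{n-2}}}\sqrt{1-r^2}\sqrt{\frac{s_{22}}{s_{11}}}$
$\theta_2 = \sigma_2^2(1-\rho^2)$ | $\pi_{a2}, \forall a$ (including $\pi_H$) | $\frac{s_{22}(1-r^2)}{\chi^{2*}_{n-2}}$
$|\Sigma|$ | $\pi_H = \pi_{12}$ and $\pi_{IJ} = \pi_{21}$ | $\frac{|S|}{\chi^{2*}_{n-1}\chi^{2*}_{n-2}}$
$\theta_4 = \frac{\sigma_2\sqrt{1-\rho^2}}{\sigma_1}$ | $\pi_H = \pi_{12}$ | $\sqrt{\frac{\chi^{2*}_{n-1}}{\chi^{2*}_{n-2}}}\sqrt{\frac{s_{22}(1-r^2)}{s_{11}}}$
$\theta_5 = \frac{\mu_1}{\sigma_1}$ | $\pi_{1b}, \forall b$ (including $\pi_J$ and $\pi_H$) | $\frac{Z^*}{\sqrt{n}} + \frac{\bar{x}_1}{\sqrt{s_{11}}}\sqrt{\chi^{2*}_{n-1}}$
$d'\Sigma d$ | $\pi_J = \pi_{10}$ and $\tilde{\pi}_H$ (see Table 4) | $\frac{d'Sd}{\chi^{2*}_{n-1}}$

Table 1 also lists the objective prior distributions that yield the indicated objective posterior. The notation $\pi_{ab}$ in the table stands for the important class of prior densities (a subclass of the generalized Wishart distributions of [8])

(8) $\pi_{ab}(\mu_1, \mu_2, \sigma_1, \sigma_2, \rho) = \frac{1}{\sigma_1^{3-a}\sigma_2^{2-b}(1-\rho^2)^{2-b/2}}$.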

Special cases of this class are the Jeffreys-rule prior $\pi_J = \pi_{10}$, the right-Haar prior $\pi_H = \pi_{12}$, the independence Jeffreys prior $\pi_{IJ} = \pi_{21} = \sigma_1^{-1}\sigma_2^{-1}(1-\rho^2)^{-3/2}$ and $\pi_{RO}$, which has $a = b = 1$. The independence Jeffreys prior follows from using a constant prior for the means, and then the Jeffreys prior for the covariance matrix with means given.
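The special cases above can be checked numerically. The sketch below assumes that the class has the form $\pi_{ab} \propto \sigma_1^{-(3-a)}\sigma_2^{-(2-b)}(1-\rho^2)^{-(2-b/2)}$, which is the form implied by the stated special cases; the function names are hypothetical.

```python
def pi_ab(sigma1, sigma2, rho, a, b):
    """Unnormalized prior density of the assumed form
    1 / (sigma1^(3-a) * sigma2^(2-b) * (1 - rho^2)^(2 - b/2))."""
    return 1.0 / (
        sigma1 ** (3 - a) * sigma2 ** (2 - b) * (1.0 - rho ** 2) ** (2 - b / 2.0)
    )


def pi_jeffreys(sigma1, sigma2, rho):
    """Jeffreys-rule prior, (a, b) = (1, 0)."""
    return pi_ab(sigma1, sigma2, rho, 1, 0)


def pi_right_haar(sigma1, sigma2, rho):
    """Right-Haar prior, (a, b) = (1, 2); free of sigma2."""
    return pi_ab(sigma1, sigma2, rho, 1, 2)


def pi_independence_jeffreys(sigma1, sigma2, rho):
    """Independence Jeffreys prior, (a, b) = (2, 1)."""
    return pi_ab(sigma1, sigma2, rho, 2, 1)


# Compare with the closed form quoted in the text for the independence Jeffreys prior.
s1, s2, rho = 2.0, 3.0, 0.5
closed_form = 1.0 / (s1 * s2 * (1.0 - rho ** 2) ** 1.5)
print(pi_independence_jeffreys(s1, s2, rho), closed_form)
```

Note that the right-Haar case does not depend on $\sigma_2$, which reflects the asymmetric treatment of the two coordinates under that prior.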

We highlight the results about $\rho$ in Table 1 because they are interesting from practical, historical and philosophical perspectives. First, it does not seem to be known that the indicated prior for $\rho$ is exact frequentist matching (proved here in Theorem 2). Indeed, standard statistical software utilizes various approximations to arrive at frequentist confidence sets for $\rho$, missing the fact that a simple exact confidence set exists, even for $n = 3$. It was, of course, known that exact frequentist confidence procedures could be constructed (cf. Exercise 54, Chapter 6 of [18]), but explicit expressions do not seem to be available.

The historically interesting aspect of this posterior for $\rho$ is that it is also the fiducial distribution of $\rho$. Geisser and Cornfield [16] studied the question of whether the fiducial distribution of $\rho$ could be reproduced as an objective Bayesian posterior, and they concluded that this was most likely not possible. The strongest evidence for this arose from Brillinger [7], which used results from [19] and a difficult analytic argument to show that there does not exist a prior $\pi(\rho)$ such that the fiducial density of $\rho$ equals $f(r \mid \rho)\pi(\rho)$, where $f(r \mid \rho)$ is the density of $r$ given $\rho$. Since the fiducial distribution of $\rho$ only depends on $r$, it was certainly reasonable to speculate that if it were not possible to derive this distribution from the density of $r$ and a prior, then it would not be possible to do so in general. The above result, of course, shows that this speculation was incorrect.

The philosophically interesting aspect of this situation is that Brillinger's result does show that the fiducial/posterior distribution for $\rho$ provides another example of the marginalization paradox ([13]). This leads to an interesting philosophical conundrum of a type that we have not previously seen: a complete fiducial/objective Bayesian/frequentist unification can be obtained for inference about $\rho$, but only if violation of the marginalization paradox is accepted. We will shortly introduce a prior distribution that avoids the marginalization paradox for $\rho$, but which is not exactly frequentist matching. We know of no way to adjudicate between the competing goals of exact frequentist matching and avoidance of the marginalization paradox, and so will simply present both as possible objective Bayesian approaches. (Note that the same conundrum also arises for $\theta_5 = \mu_1/\sigma_1$: the exact frequentist matching prior results in a marginalization paradox, as shown in [24].) Some interesting examples of improper priors resulting in a marginalization paradox can be found in Ghosh and Yang [17] and Datta and Ghosh [10, 11].

1.3. Recommended priors. It is actually rare to have exact matching priors for parameters of interest. Also, one is often interested in very complex
functions of parameters (e.g., predictive distributions) and/or joint distri- 
butions of parameters. For such problems it is important to have a general 
objective prior that seems to perform reasonably well for all quantities of 
interest. Furthermore, it is unappealing to many Bayesians to change the 
prior according to which parameter is declared to be of interest, and an 
objective prior that performs well overall is often sought. 
Table 2

Recommendations of objective priors for various parameters in the bivariate normal model: $\Box$ indicates that the posterior will not be exact frequentist matching. (For $\mu_2$ and parameters with $\sigma_1$ replaced by $\sigma_2$, use the right-Haar prior with the variances interchanged.)



Prior 



Parameter 



71" // 

tth (see Table 4) 

7TRA 
TTRct 



s 



general use 



d'dti.Ata)', d'Ed 



c/lmax(S) 



<T12 = p0-\O-2 



The five priors we recommend for various purposes are $\pi_H$, $\tilde{\pi}_H$,



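(9) $\pi_{R\rho} \propto \frac{1}{\sigma_1\sigma_2(1-\rho^2)}$, $\pi_{R\sigma} \propto \frac{\sqrt{1+\rho^2}}{\sigma_1\sigma_2(1-\rho^2)}$

and

(10) $\pi_{R\lambda} \propto \frac{1}{\sigma_1\sigma_2(1-\rho^2)\sqrt{(\sigma_1/\sigma_2 - \sigma_2/\sigma_1)^2 + 4\rho^2}}$.

The first prior in (9) was developed in [20] and was studied extensively in [1], where it was shown to be a one-at-a-time reference prior (see Section 3). The second prior in (9) is new and is derived in Section 3. $\pi_{R\lambda}$ was developed as a one-at-a-time reference prior in [25].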

With these definitions, we can make our summary recommendations. Ta- 
ble 2 gives the four objective priors that are recommended for use, and in- 
dicates for which parameters (or functions thereof) they are recommended. 
These recommendations are based on three criteria: (i) the degree of frequen- 
tist matching, discussed in Section 4; (ii) being a one-at-a-time reference 
prior, discussed in Section 3; and (iii) ease of computation. The rationale for 
each of the entries in the table, based on these criteria, is given in Section 
4.5. 

Another commonly used prior is the "scale prior," $\pi_S \propto (\sigma_1\sigma_2)^{-1}$. The motivation that is often given for this prior is that it is "standard" to use $\sigma_i^{-1}$ as the prior for a standard deviation $\sigma_i$, while $-1 < \rho < 1$ is on a bounded set and so one can use a constant prior in $\rho$. We do not recommend this prior, but do consider its performance in Section 4.5.

2. Computation. In this paper, a constant prior is always used for $(\mu_1, \mu_2)$, so that

(11) $\left( (\mu_1, \mu_2)' \mid \Sigma, X \right) \sim N_2\left( (\bar{x}_1, \bar{x}_2)', n^{-1}\Sigma \right)$.

Generation from this conditional posterior distribution is standard, so the challenge of simulation from the posterior distribution requires only sampling from $(\sigma_1, \sigma_2, \rho \mid X)$.

The marginal likelihood of $(\sigma_1, \sigma_2, \rho)$ satisfies

(12) $L(\sigma_1, \sigma_2, \rho) \propto \frac{1}{|\Sigma|^{(n-1)/2}} \exp\left( -\frac{1}{2}\mathrm{tr}(S\Sigma^{-1}) \right)$.

It is immediate that, under the priors $\pi_J$ and $\pi_{IJ}$, the marginal posteriors of $\Sigma$ are Inverse Wishart $(S^{-1}, n)$ and Inverse Wishart $(S^{-1}, n-1)$, respectively.

Berger, Strawderman and Tang [4] gave a Metropolis-Hastings algorithm to generate from $(\sigma_1, \sigma_2, \rho \mid X)$ based on the prior $\pi_{R\lambda}$. The following sections deal with the other priors we consider.

2.1. Marginal posteriors of $(\sigma_1, \sigma_2, \rho)$ under $\pi_{R\rho}$, $\pi_{R\sigma}$, $\tilde{\pi}_{R\sigma}$ and $\pi_S$. For these priors, an independent sample from $\pi(\sigma_1, \sigma_2, \rho \mid X)$ can be obtained by the following acceptance-rejection algorithm:

Simulation step. Generate $(\sigma_1, \sigma_2, \rho)$ from the independence Jeffreys posterior $\pi_{IJ}(\sigma_1, \sigma_2, \rho \mid X)$ [the Inverse Wishart $(S^{-1}, n-1)$ distribution] and, independently, sample $u \sim$ Uniform(0, 1).

Rejection step. Suppose $M = \sup_{(\sigma_1, \sigma_2, \rho)} \pi(\sigma_1, \sigma_2, \rho)/\pi_{IJ}(\sigma_1, \sigma_2, \rho) < \infty$. If $u \leq \pi(\sigma_1, \sigma_2, \rho)/[M\pi_{IJ}(\sigma_1, \sigma_2, \rho)]$, accept $(\sigma_1, \sigma_2, \rho)$; else, return to Simulation step.

For each of the priors listed in Table 3, the key ratio, $\pi/\pi_{IJ}$, is listed in the table, along with the upper bound $M$, the Rejection step and the resulting acceptance probability for $\rho = 0.80, 0.95, 0.99$. The rejection algorithm is quite efficient for sampling these posteriors. Indeed, for $\rho \approx 0$, the algorithms accept with probability near one and, even for large $|\rho|$, the acceptance probabilities are very reasonable for the priors $\pi_{R\rho}$, $\pi_{R\sigma}$ and $\tilde{\pi}_{R\sigma}$. For large $|\rho|$, the algorithm is less efficient for the posteriors under the prior $\pi_S$, but even these acceptance rates may well be fine in practice, given the simplicity of the algorithm.
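The two steps above can be sketched as follows. This is a minimal illustration, not the authors' implementation. It assumes that draws from the independence Jeffreys posterior are generated through the constructive representation of Section 2.2 with $(a, b) = (2, 1)$ rather than through an Inverse Wishart routine, and that the acceptance rules are $u \leq \sqrt{1-\rho^2}$, $u \leq \sqrt{1-\rho^4}$, $u \leq \sqrt{2(1-\rho^2)/(2-\rho^2)}$ and $u \leq (1-\rho^2)^{3/2}$ for the four priors. The dictionary keys and function names are hypothetical labels.

```python
import math
import random


def chi2(rng, df):
    """Chi-squared draw via a gamma draw with shape df/2 and scale 2."""
    return rng.gammavariate(df / 2.0, 2.0)


def draw_independence_jeffreys(s11, s22, r, n, rng):
    """One draw of (sigma1, sigma2, rho) under pi_IJ = pi_21, built from
    the eta parameterization (assumed constructive form, a = 2, b = 1)."""
    eta1 = math.sqrt(chi2(rng, n - 2) / s11)
    eta2 = math.sqrt(chi2(rng, n - 1) / (s22 * (1.0 - r * r)))
    eta3 = rng.gauss(0.0, 1.0) / math.sqrt(s11) - eta2 * r * math.sqrt(s22 / s11)
    root = math.hypot(eta1, eta3)
    return 1.0 / eta1, root / (eta1 * eta2), -eta3 / root


# Acceptance probability pi / (M * pi_IJ) as a function of rho (hypothetical keys).
ACCEPT = {
    "R_rho": lambda rho: math.sqrt(1.0 - rho ** 2),
    "R_sigma": lambda rho: math.sqrt(1.0 - rho ** 4),
    "R_sigma_tilde": lambda rho: math.sqrt(2.0 * (1.0 - rho ** 2) / (2.0 - rho ** 2)),
    "S": lambda rho: (1.0 - rho ** 2) ** 1.5,
}


def accept_reject(prior, s11, s22, r, n, size, seed=0):
    """Return `size` accepted draws and the observed acceptance rate."""
    rng = random.Random(seed)
    accept = ACCEPT[prior]
    kept, proposed = [], 0
    while len(kept) < size:
        proposed += 1
        candidate = draw_independence_jeffreys(s11, s22, r, n, rng)
        if rng.random() <= accept(candidate[2]):
            kept.append(candidate)
    return kept, size / proposed


draws, rate = accept_reject("R_rho", s11=4.0, s22=9.0, r=0.3, n=10, size=1000)
print(len(draws), rate)
```

Because every ratio depends on the candidate only through $\rho$, the scale parameters never enter the rejection step, which is what keeps the algorithm so simple.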

2.2. Computation under $\pi_{ab}$. The most interesting prior of this form (besides the Jeffreys and independence Jeffreys priors) is the right-Haar prior $\pi_H$, although other priors such as $\pi_{RO}$ arise as reference priors, and hence are potentially of interest. While Table 1 gave an explicit form for the most important marginal posteriors arising from priors of this form, it is of considerable interest that essentially closed form generation from the full posterior of any prior of this form is possible (see, e.g., [8]). This is briefly reviewed in this section, since the expressions for the resulting constructive posteriors are needed for later results on frequentist coverage.



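Table 3

Ratio $\pi/\pi_{IJ}$, upper bound $M$, rejection step and acceptance probability for $\rho = 0.80, 0.95, 0.99$, when $\pi = \pi_{R\rho}$, $\pi_{R\sigma}$, $\tilde{\pi}_{R\sigma}$ and $\pi_S$

Prior | Ratio | Bound $M$ | Rejection step | Acceptance probability: $\rho = 0.80$ | $\rho = 0.95$ | $\rho = 0.99$
$\pi_{R\rho}$ | $\sqrt{1-\rho^2}$ | 1 | $u \leq \sqrt{1-\rho^2}$ | 0.6000 | 0.3122 | 0.1410
$\pi_{R\sigma}$ | $\sqrt{1-\rho^4}$ | 1 | $u \leq \sqrt{1-\rho^4}$ | 0.7684 | 0.4307 | 0.1985
$\tilde{\pi}_{R\sigma}$ | $\sqrt{\frac{1-\rho^2}{2-\rho^2}}$ | $\frac{1}{\sqrt{2}}$ | $u \leq \sqrt{\frac{2(1-\rho^2)}{2-\rho^2}}$ | 0.7276 | 0.4215 | 0.1975
$\pi_S$ | $(1-\rho^2)^{3/2}$ | 1 | $u \leq (1-\rho^2)^{3/2}$ | 0.2160 | 0.0304 | 0.0028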



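It is most convenient to work with the parameters $(\eta_1, \eta_2, \eta_3)$ given in (2). This parameterization gives a type of Cholesky decomposition of the precision matrix $\Sigma^{-1}$,

(13) $\Sigma^{-1} = \begin{pmatrix} \eta_1 & \eta_3 \\ 0 & \eta_2 \end{pmatrix}\begin{pmatrix} \eta_1 & 0 \\ \eta_3 & \eta_2 \end{pmatrix}$,

which accounts for the simplicity of ensuing computations. Note that (2) is equivalent to

(14) $\sigma_1 = \frac{1}{\eta_1}$, $\sigma_2 = \frac{\sqrt{\eta_1^2 + \eta_3^2}}{\eta_1\eta_2}$, $\rho = -\frac{\eta_3}{\sqrt{\eta_1^2 + \eta_3^2}}$.

The prior $\pi_{ab}$ of (8) for $(\mu_1, \mu_2, \sigma_1, \sigma_2, \rho)$ transforms to the extended conjugate class of priors for $(\mu_1, \mu_2, \eta_1, \eta_2, \eta_3)$, given by $\pi_{ab}(\mu_1, \mu_2, \eta_1, \eta_2, \eta_3) = \eta_1^{-a}\eta_2^{-b}$.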
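The relation between $(\sigma_1, \sigma_2, \rho)$ and $(\eta_1, \eta_2, \eta_3)$ can be verified numerically. The sketch below assumes the definitions $\eta_1 = 1/\sigma_1$, $\eta_2 = 1/(\sigma_2\sqrt{1-\rho^2})$, $\eta_3 = -\rho/(\sigma_1\sqrt{1-\rho^2})$ and an upper-triangular factor with rows $(\eta_1, \eta_3)$ and $(0, \eta_2)$; the function names are hypothetical.

```python
import math


def to_eta(sigma1, sigma2, rho):
    """Map (sigma1, sigma2, rho) to (eta1, eta2, eta3)."""
    root = math.sqrt(1.0 - rho * rho)
    return 1.0 / sigma1, 1.0 / (sigma2 * root), -rho / (sigma1 * root)


def from_eta(eta1, eta2, eta3):
    """Inverse map back to (sigma1, sigma2, rho)."""
    root = math.hypot(eta1, eta3)
    return 1.0 / eta1, root / (eta1 * eta2), -eta3 / root


def precision_from_eta(eta1, eta2, eta3):
    """U U' for the upper-triangular U = [[eta1, eta3], [0, eta2]]."""
    return [
        [eta1 * eta1 + eta3 * eta3, eta2 * eta3],
        [eta2 * eta3, eta2 * eta2],
    ]


def precision_direct(sigma1, sigma2, rho):
    """Inverse of the 2 x 2 covariance matrix, computed entry by entry."""
    det = sigma1 ** 2 * sigma2 ** 2 * (1.0 - rho ** 2)
    off = -rho * sigma1 * sigma2 / det
    return [[sigma2 ** 2 / det, off], [off, sigma1 ** 2 / det]]


params = (2.0, 0.5, -0.3)
eta = to_eta(*params)
print(from_eta(*eta))            # recovers params up to rounding
print(precision_from_eta(*eta))  # agrees with precision_direct(*params)
```

The round trip and the agreement of the two precision matrices confirm that the two parameterizations carry the same information.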

Lemma 1. Consider the prior $\pi_{ab}$.

(a) The marginal posterior of $\eta_3$ given $(\eta_1, \eta_2; X)$ is $N(-\eta_2 r\sqrt{s_{22}/s_{11}}, 1/s_{11})$.

(b) The marginal posterior distributions of $\eta_1$ and $\eta_2$ are independent and

$(\eta_1^2 \mid X) \sim \mathrm{Gamma}(\frac{1}{2}(n-a), \frac{1}{2}s_{11})$;

$(\eta_2^2 \mid X) \sim \mathrm{Gamma}(\frac{1}{2}(n-b), \frac{1}{2}s_{22}(1-r^2))$.

See [5] for a proof of this result. We next present the constructive posteriors of $(\eta_1, \eta_2, \eta_3)$ and from these derive the constructive posteriors of $(\mu_1, \mu_2, \sigma_1, \sigma_2, \rho)$ and other parameters. All results follow directly from Lemma 1 and (14).

In presenting the constructive posteriors, we will use a star to represent a random draw from the implied distribution; thus $\mu_1^*$ will represent a random draw from its posterior distribution, $Z_1^*, Z_2^*, Z_3^*$ will be independent draws from the standard normal distribution, and $\chi^{2*}_{n-a}$ and $\chi^{2*}_{n-b}$ will be independent draws from chi-squared distributions with the indicated degrees of freedom. Recall that these constructive posteriors are not only useful for simulation, but will be the key to proving exact frequentist matching results.

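Fact 1. (a) The constructive posterior of $(\eta_1, \eta_2, \eta_3)$ given $X$ can be expressed as

(15) $\eta_1^* = \sqrt{\frac{\chi^{2*}_{n-a}}{s_{11}}}$, $\eta_2^* = \sqrt{\frac{\chi^{2*}_{n-b}}{s_{22}(1-r^2)}}$, $\eta_3^* = \frac{Z_3^*}{\sqrt{s_{11}}} - \frac{\sqrt{\chi^{2*}_{n-b}}}{\sqrt{s_{11}}}\frac{r}{\sqrt{1-r^2}}$.

(b) The constructive posterior of $(\sigma_1, \sigma_2, \rho)$ given $X$ can be expressed as

(16) $\sigma_1^* = \sqrt{\frac{s_{11}}{\chi^{2*}_{n-a}}}$,

(17) $\sigma_2^* = \sqrt{s_{22}(1-r^2)}\sqrt{\frac{1}{\chi^{2*}_{n-b}} + \frac{1}{\chi^{2*}_{n-a}}\left( \frac{Z_3^*}{\sqrt{\chi^{2*}_{n-b}}} - \frac{r}{\sqrt{1-r^2}} \right)^2}$,

(18) $\rho^* = \psi(Y^*)$, $Y^* = -\frac{Z_3^*}{\sqrt{\chi^{2*}_{n-a}}} + \frac{\sqrt{\chi^{2*}_{n-b}}}{\sqrt{\chi^{2*}_{n-a}}}\frac{r}{\sqrt{1-r^2}}$,

where $\psi(x) = x/\sqrt{1+x^2}$.

(c) The constructive posterior for $\mu_1$ and $\mu_2$ can be written

(19) $\mu_1^* = \bar{x}_1 + \frac{Z_1^*}{\sqrt{\chi^{2*}_{n-a}}}\sqrt{\frac{s_{11}}{n}}$,

(20) $\mu_2^* = \bar{x}_2 + \frac{Z_1^*}{\sqrt{\chi^{2*}_{n-a}}}\frac{r\sqrt{s_{22}}}{\sqrt{n}} + \left( \frac{Z_2^*}{\sqrt{\chi^{2*}_{n-b}}} - \frac{Z_1^* Z_3^*}{\sqrt{\chi^{2*}_{n-b}}\sqrt{\chi^{2*}_{n-a}}} \right)\sqrt{\frac{s_{22}(1-r^2)}{n}}$.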

3. Reference priors. This paper began with an effort to derive and cata- 
logue the possible reference priors for the bivariate normal distribution. The 
reference prior theory (cf. Bernardo [6] and Berger and Bernardo [3]) has 
arguably been the most successful technique for deriving objective priors. 
Reference priors depend on (i) specification of a parameter of interest; (ii) 
specification of nuisance parameters; (iii) specification of a grouping of pa- 
rameters; and (iv) ordering of the groupings. These are all conveyed by the 
Table 4

Reference priors for the bivariate normal model [where $\tilde{\mu}_1 = d'(\mu_1, \mu_2)'$, $(\tilde{\sigma}_1)^2 = \theta_{11}$, $\tilde{\rho} = d'\Sigma(0, 1)'/(\sigma_2\sqrt{\theta_{11}})$, $\tilde{\theta}_2 = \sigma_2^2[1 - (\tilde{\rho})^2]$ and $\tilde{\theta}_1 = \tilde{\rho}\sigma_2/\tilde{\sigma}_1$]; {{ }} indicates that any ordering of the parameters yields the same reference prior

Prior $\pi(\mu_1, \mu_2, \sigma_1, \sigma_2, \rho)$    For parameter ordering    Has form (8) with

ttj oc CT 2 CT a ( i_ p2) 2 {(Mi,M2,cn,o-2,p)} (a, 6) = (1,0) 

7!7jOC ^^(1-^2)3/2 {(Mi,M2), (ai,<r 2 ,p)} (o, ft) = (2,1) 

71-flp OC ^7J7-^] {p,cri,o- 2 }, {0 7 ,G 6 ,p} 

^ ^ CT 1^(1-P a ) {0-1,0-2, p) 

^ ct0C „ \, P, 2 {o-l,p,0- 2 } 

{o-i,^,^} 

kro oc ^ CT2(1 l p 2 )3 /'2 {o-i, 6*2,773} (a, 6) = (1,1) 
- m „, [((^i/^ 2 )-(^2/ti)) 2 +4p 2 ]- 1/2 n, x , n 

^ (i 1 _ p2) {{ffi.Oi.fla}}, {{01,03,04}} (0,6) = (1,2) 
- H K d^d^y {{d'(p l! p 2 )',p 2 ,e 11 ,^2,e 1 }} 



shorthand notation used in Table 4. Thus, $\{(\mu_1, \mu_2), (\sigma_1, \sigma_2, \rho)\}$ indicates that $(\mu_1, \mu_2)$ is the parameter of interest, with the others being nuisance parameters, and there are two groupings with the indicated ordering. (The resulting reference prior is the independence Jeffreys prior, $\pi_{IJ}$.) As another example, $\{\lambda_1, \lambda_2, \vartheta, \mu_1, \mu_2\}$ introduces the eigenvalues $\lambda_1 > \lambda_2$ of $\Sigma$ as being primarily of interest, with $\vartheta$ (the angle defining the orthogonal matrix that diagonalizes $\Sigma$), and $\mu_i$ being the nuisance parameters.

Based on experience with numerous examples, the reference priors that are typically judged to be best are one-at-a-time reference priors, in which each parameter is listed separately as its own group. Hence we will focus on these priors. It turns out to be the case that, for the one-at-a-time reference priors, the ordering of $\mu_1$ and $\mu_2$ among the variables is irrelevant. Hence if $\mu_1$ and $\mu_2$ are omitted from a listing in Table 4, the resulting reference prior is to be viewed as any one-at-a-time reference prior with the indicated ordering of other variables, with the $\mu_i$ being inserted anywhere in the ordering.

We are interested in finding one-at-a-time reference priors for the parameters $\mu_1$, $\mu_2$, $\sigma_1$, $\sigma_2$, $\rho$, $\eta_3$, $\theta_1, \ldots, \theta_9$ and $\lambda_1$. This is done in [5], with the results summarized in Table 4, for all these parameters (i.e., the parameter appears as the first entry in the parameter ordering) except $\eta_3$, $\sigma_{12}$, and $\mu_1/\sigma_1$; finding one-at-a-time reference priors for these parameters is technically challenging. (We do not explicitly list the reference priors for $\sigma_2$ in the table, since they can be found by simply switching with $\sigma_1$ in the various expressions.)

4. Comparisons of priors via frequentist matching. 

4.1. Frequentist coverage probabilities and exact matching. Suppose a posterior distribution is used to create one-sided credible intervals $(\theta_L, \theta_{1-\alpha}(X))$, where $\theta_L$ is the lower limit in the relevant parameter space and $\theta_{1-\alpha}(X)$ is the posterior quantile of the parameter $\theta$ of interest, defined by $P(\theta < \theta_{1-\alpha}(X) \mid X) = 1 - \alpha$. (Here $\theta$ is the random variable.) Of interest is the frequentist coverage of the corresponding confidence interval, that is, $C(\mu_1, \mu_2, \sigma_1, \sigma_2, \rho) = P(\theta < \theta_{1-\alpha}(X) \mid \mu_1, \mu_2, \sigma_1, \sigma_2, \rho)$. (Here $X$ is the random variable.) The closer $C(\mu_1, \mu_2, \sigma_1, \sigma_2, \rho)$ is to the nominal $1 - \alpha$, the better the procedure (and corresponding objective prior) is judged to be.

The main results about exact matching are given in Theorems 1 through 
8. The proofs of Theorems 1, 2 and 8 are given in Section 5; the rest can be 
found in [5]. 

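The following technical lemmas will be repeatedly utilized. The first lemma is from (3d.2.8) in [22]. Lemma 3 is easy.

Lemma 2. For $n \geq 3$ and given $\sigma_1, \sigma_2, \rho$, the following three random variables are independent and have the indicated distributions:

(21) $T_2 = \frac{\sqrt{s_{11}}}{\sigma_2\sqrt{1-\rho^2}}\left( r\sqrt{\frac{s_{22}}{s_{11}}} - \frac{\rho\sigma_2}{\sigma_1} \right) = Z_3$ (standard normal),

(22) $T_3 = \frac{s_{22}(1-r^2)}{\sigma_2^2(1-\rho^2)} = \chi^2_{n-2}$,

(23) $T_5 = \frac{s_{11}}{\sigma_1^2} = \chi^2_{n-1}$.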

Lemma 3. Let $Y_{1-\alpha}$ denote the $1 - \alpha$ quantile of any random variable $Y$.

(a) If $g(\cdot)$ is a monotonically increasing function, $[g(Y)]_{1-\alpha} = g(Y_{1-\alpha})$ for any $\alpha \in (0, 1)$.

(b) If $W$ is a positive random variable, $(WY)_{1-\alpha} > 0$ if and only if $Y_{1-\alpha} > 0$.

We will reserve quantile notation for posterior quantiles, with respect to the $*$ distributions. Thus the quantile $[(\sigma_1 Z_3^* - rZ_3)/\chi^2_{n-1} + \rho\sqrt{s_{11}\chi^{2*}_{n-b}}]_{1-\alpha}$ would be computed based on the joint distribution of $(Z_3^*, \chi^{2*}_{n-b})$, while holding $(\sigma_1, \rho, r, s_{11}, Z_3, \chi^2_{n-1})$ fixed.
4.2. Credible intervals for a class of functions of $(\sigma_1, \sigma_2, \rho)$. We consider the one-sided credible intervals of $\sigma_1$, $\sigma_2$ and $\rho$ and some functions of the form

(24) $\theta = \sigma_1^{d_1}\sigma_2^{d_2}g(\rho)$,

for $d_1, d_2 \in R$ and some function $g(\cdot)$. We also consider a class of scale-invariant priors

(25) $\pi(\mu_1, \mu_2, \sigma_1, \sigma_2, \rho) \propto \frac{h(\rho)}{\sigma_1^{c_1}\sigma_2^{c_2}}$,

for some $c_1, c_2 \in R$ and a positive function $h$.



Theorem 1. Denote the $1 - \alpha$ posterior quantile of $\theta$ by $\theta_{1-\alpha}(X)$ under the prior (25). For any fixed $(\mu_1, \mu_2, \sigma_1, \sigma_2, \rho)$, the frequentist coverage of the credible interval $(\theta_L, \theta_{1-\alpha}(X))$ depends only on $\rho$. Here $\theta_L$ is the lower boundary of the parameter space for $\theta$.

Note that parameters $\rho, \eta_1, \eta_2, \eta_3, \theta_1, \ldots, \theta_4$ are all functions of the form (24). From Theorem 1, under any of the priors $\pi_J$, $\pi_{IJ}$, $\pi_{R\sigma}$, $\pi_{R\rho}$, $\pi_{RO}$, $\pi_H$, $\pi_S$, the frequentist coverage probabilities of credible intervals for any of these parameters will depend only on $\rho$. We will show that the frequentist coverage probabilities could be exact under the prior $\pi_{ab}$. Since $\eta_1$ ($\eta_2$) is a monotone function of $\sigma_1$ ($\theta_2$), we consider only $\rho$ and the last 5 parameters.

4.3. Coverage probabilities under $\pi_{ab}$.

Theorem 2. (a) For $\psi$ defined in (18), the posterior $1 - \alpha$ quantile of $\rho$ is $\rho^*_{1-\alpha} = \psi(Y^*_{1-\alpha})$. (b) For any $\alpha \in (0, 1)$, $\xi = (\mu_1, \mu_2, \sigma_1, \sigma_2)$ and $\rho \in (-1, 1)$,

(26) $P(\rho < \rho^*_{1-\alpha} \mid \xi, \rho)$

(c) (26) equals $1 - \alpha$ if and only if the right-Haar prior is used, that is, $(a, b) = (1, 2)$.
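Theorem 2(c) can be checked by simulation. The sketch below is a rough Monte Carlo illustration, not a proof: it assumes the constructive form $Y^* = -Z_3^*/\sqrt{\chi^{2*}_{n-a}} + \sqrt{\chi^{2*}_{n-b}/\chi^{2*}_{n-a}}\; r/\sqrt{1-r^2}$, estimates posterior quantiles from sorted draws, and uses unit variances and zero means, which is harmless because the coverage depends only on $\rho$. All function names, the sample sizes and the seed are hypothetical choices.

```python
import math
import random


def chi2(rng, df):
    """Chi-squared draw via a gamma draw with shape df/2 and scale 2."""
    return rng.gammavariate(df / 2.0, 2.0)


def sample_r(rho, n, rng):
    """Sample correlation of n bivariate normal pairs with correlation rho."""
    xs, ys = [], []
    for _ in range(n):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        xs.append(z1)
        ys.append(rho * z1 + math.sqrt(1.0 - rho * rho) * z2)
    mx, my = sum(xs) / n, sum(ys) / n
    s11 = sum((x - mx) ** 2 for x in xs)
    s22 = sum((y - my) ** 2 for y in ys)
    s12 = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return s12 / math.sqrt(s11 * s22)


def rho_upper_quantile(r, n, a, b, level, draws, rng):
    """Monte Carlo estimate of the posterior `level` quantile of rho."""
    t = r / math.sqrt(max(1.0 - r * r, 1e-300))
    ys = []
    for _ in range(draws):
        chi_a, chi_b = chi2(rng, n - a), chi2(rng, n - b)
        ys.append(-rng.gauss(0.0, 1.0) / math.sqrt(chi_a) + math.sqrt(chi_b / chi_a) * t)
    ys.sort()
    y_q = ys[min(draws - 1, int(level * draws))]
    return y_q / math.sqrt(1.0 + y_q * y_q)  # psi is increasing, so apply it last


def coverage(rho, n, a, b, level=0.95, datasets=400, draws=400, seed=1):
    """Fraction of simulated data sets with rho below the posterior quantile."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(datasets):
        r = sample_r(rho, n, rng)
        if rho < rho_upper_quantile(r, n, a, b, level, draws, rng):
            hits += 1
    return hits / datasets


print(coverage(0.5, 5, a=1, b=2))  # right-Haar prior; should be near 0.95
```

Rerunning `coverage` with other values of $(a, b)$, for example the Jeffreys-rule choice $(1, 0)$, gives a feel for how far the coverage can drift from the nominal level when the right-Haar prior is not used.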

Theorem 3. (a) For any $\alpha \in (0, 1)$, $\xi = (\mu_1, \mu_2, \sigma_1, \sigma_2)$ and $\rho \in (-1, 1)$,

$P(\eta_3 < (\eta_3^*)_{1-\alpha} \mid \xi, \rho)$
(27) 



Xn-2 V \/Xn-6 7 1 "° 

(b) (27) equals $1 - \alpha$ for any $-1 < \rho < 1$ if and only if $b = 2$.
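Theorem 4. (a) The constructive posterior of $\theta_1 = \rho\sigma_2/\sigma_1$ has the expression

$\theta_1^* = r\sqrt{\frac{s_{22}}{s_{11}}} - \frac{Z_3^*}{\sqrt{\chi^{2*}_{n-b}}}\sqrt{1-r^2}\sqrt{\frac{s_{22}}{s_{11}}}$.

(b) For any $\alpha \in (0, 1)$, $\xi = (\mu_1, \mu_2, \sigma_1, \sigma_2)$ and $\rho \in (-1, 1)$,

(28) $P(\theta_1 < (\theta_1^*)_{1-\alpha} \mid \xi, \rho) = P\left( t_{n-2} < \sqrt{\frac{n-2}{n-b}}\,(t^*_{n-b})_{1-\alpha} \right)$,

which does not depend on $\rho$. Furthermore, (28) equals $1 - \alpha$ if and only if $b = 2$.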

Theorem 5. (a) The constructive posterior of $\theta_2 = \sigma_2^2(1-\rho^2)$ is $\theta_2^* = s_{22}(1-r^2)/\chi^{2*}_{n-b}$.

(b) For any $\alpha \in (0, 1)$, $\xi = (\mu_1, \mu_2, \sigma_1, \sigma_2)$ and $\rho \in (-1, 1)$,

(29) $P(\theta_2 < (\theta_2^*)_{1-\alpha} \mid \xi, \rho) = P(\chi^2_{n-2} > (\chi^{2*}_{n-b})_{\alpha})$,

which does not depend on $\rho$. Furthermore, (29) equals $1 - \alpha$ if and only if $b = 2$.
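The right-hand side of (29) is easy to approximate by simulation, which gives a quick numerical check of the "if and only if $b = 2$" statement. The sketch below assumes the form $P(\chi^2_{n-2} > (\chi^{2*}_{n-b})_\alpha)$ for the coverage; the function names, sample size and seed are hypothetical choices, and the quantile is a plain empirical quantile.

```python
import random


def chi2_draws(df, size, rng):
    """Chi-squared draws via gamma draws with shape df/2 and scale 2."""
    return [rng.gammavariate(df / 2.0, 2.0) for _ in range(size)]


def theta2_coverage(n, b, alpha=0.05, size=20000, seed=0):
    """Monte Carlo estimate of P(chi2_{n-2} > alpha-quantile of chi2_{n-b})."""
    rng = random.Random(seed)
    posterior = sorted(chi2_draws(n - b, size, rng))
    q_alpha = posterior[int(alpha * size)]        # empirical alpha-quantile
    sampling = chi2_draws(n - 2, size, rng)
    return sum(1 for x in sampling if x > q_alpha) / size


print(theta2_coverage(5, 2))  # b = 2: close to the nominal 0.95
print(theta2_coverage(5, 1))  # b = 1: noticeably below 0.95
```

With $b = 2$ the two chi-squared distributions coincide, so the probability is $1 - \alpha$ by construction; with any other $b$ the degrees of freedom differ and the coverage moves away from the nominal level.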



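Theorem 6. (a) The constructive posterior of $\theta_3 = |\Sigma|$ is $\theta_3^* = |S|/(\chi^{2*}_{n-a}\chi^{2*}_{n-b})$.

(b) For any $\xi = (\mu_1, \mu_2, \sigma_1, \sigma_2)$ and $\rho \in (-1, 1)$,

(30) $P(\theta_3 < (\theta_3^*)_{1-\alpha} \mid \xi, \rho) = P(\chi^2_{n-1}\chi^2_{n-2} > (\chi^{2*}_{n-a}\chi^{2*}_{n-b})_{\alpha})$,

which does not depend on $\rho$. Furthermore, (30) equals $1 - \alpha$ iff $(a, b)$ is $(1, 2)$ or $(2, 1)$.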

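Theorem 7. (a) The constructive posterior of $\theta_4$ is

$\theta_4^* = \sqrt{\frac{\chi^{2*}_{n-a}}{\chi^{2*}_{n-b}}}\sqrt{\frac{s_{22}(1-r^2)}{s_{11}}}$.

(b) For any $\xi = (\mu_1, \mu_2, \sigma_1, \sigma_2)$ and $\rho \in (-1, 1)$,

(31) $P(\theta_4 < (\theta_4^*)_{1-\alpha} \mid \xi, \rho) = P(\chi^2_{n-1}/\chi^2_{n-2} < (\chi^{2*}_{n-a}/\chi^{2*}_{n-b})_{1-\alpha})$,

which does not depend on $\rho$. Furthermore, (31) equals $1 - \alpha$ iff $(a, b) = (1, 2)$.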



An interesting function of $(\mu_1, \mu_2, \sigma_1, \sigma_2, \rho)$ not of the form (24) is $\theta_5 = \mu_1/\sigma_1$.

Theorem 8. (a) The constructive posterior of $\theta_5 = \mu_1/\sigma_1$ is

$\theta_5^* = \frac{Z_1^*}{\sqrt{n}} + \frac{\bar{x}_1}{\sqrt{s_{11}}}\sqrt{\chi^{2*}_{n-a}}$.

(b) For any $\alpha \in (0, 1)$, the frequentist coverage of the credible interval $(-\infty, (\theta_5^*)_{1-\alpha})$ is

$P(\theta_5 < (\theta_5^*)_{1-\alpha} \mid \mu_1, \mu_2, \sigma_1, \sigma_2, \rho)$

<32) =P (*-W jzj-h** 9 \ 

which depends on $\theta_5$ only and equals $1 - \alpha$ if and only if $a = 1$.

4.4. First order asymptotic matching. Datta and Mukerjee [9] and Datta and Ghosh [12] discuss how to determine first-order matching priors for functions of parameters; these are priors such that the frequentist coverage of a one-sided credible interval is equal to the Bayesian coverage up to a term of order $n^{-1}$. For each of the nine objective priors $\pi_J$, $\pi_{IJ}$, $\pi_{R\rho}$, $\tilde{\pi}_{R\sigma}$, $\pi_{RO}$, $\pi_{R\lambda}$, $\pi_H$, $\pi_S$ and $\pi_{R\sigma}$, [5] determines if it is a first-order matching prior for each of the parameters $\mu_1$, $\mu_2$, $\sigma_1$, $\sigma_2$, $\rho$, $\theta_1, \ldots, \theta_{10}$. The results are listed in Table 5. For example, $\pi_J$ is a first-order matching prior for $\mu_1$, $\mu_2$, $\sigma_1$, $\sigma_2$, $\theta_1$, $\theta_5$, $\theta_7$, $\theta_8$, and $\theta_{10}$, but not for $\eta_3$, $\theta_2$, $\theta_3$ and $\theta_9$.

4.5. Numerically computed coverage and recommendations. First-order matching is only an asymptotic property, and finite sample performance is also crucial. We thus also implemented a modest numerical study, comparing the numerical values of frequentist coverages of the one-sided credible sets $P(\theta > q_{0.05})$ and $P(\theta < q_{0.95})$, for the parameters, $\theta$, listed in Table 6 and for the eight objective priors $\pi_J$, $\pi_{IJ}$, $\pi_{R\rho}$, $\pi_{R\sigma}$, $\pi_{RO}$, $\pi_{R\lambda}$, $\pi_H$ and $\pi_S$. As usual, $q_\alpha = q_\alpha(X)$ is the posterior $\alpha$-quantile of $\theta$, and the coverage probability is computed based on the sampling distribution of $q_\alpha(X)$ for the fixed parameter $(\mu_1, \mu_2, \sigma_1, \sigma_2)$ and $\rho$. Many of the coverage probabilities depend only on $\rho$, which was thus chosen to be the x-axis in the graphs. We considered the case $n = 3$ (the minimal possible sample size and hence the most challenging in terms of obtaining good coverage) and the two scenarios Case a: $(\mu_1, \mu_2, \sigma_1, \sigma_2) = (0, 0, 1, 1)$, and Case b: $(\mu_1, \mu_2, \sigma_1, \sigma_2) = (0, 0, 2, 1)$.

Here we present the numerical results concerning coverage for only two of the parameters: $\rho$ in Figure 1 and $\theta_7 = \sigma_2/\sigma_1$ in Figure 2. Table 6 summarizes the results from the entire numerical study, the details of which can be found in [5]. The recommendations made in Table 2 for the boxed parameters are justified from these numerical results as follows.

The inferences involving the nonboxed parameters in Table 2 are given in 
closed form in Table 1 (and so are computationally simple), and are exact 
Table 5

The first-order asymptotic matching of objective priors for $\mu_1$, $\mu_2$, $\sigma_1$, $\sigma_2$, $\rho$, $\mu_1 - \mu_2$, $\eta_3$, $\theta_j$, $j = 1, \ldots, 10$. Here a boldface letter indicates exact matching

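Prior $\pi(\mu_1, \mu_2, \sigma_1, \sigma_2, \rho)$ | Asymptotic matching: Yes | Asymptotic matching: No
$\pi_J \propto \frac{1}{\sigma_1^2\sigma_2^2(1-\rho^2)^2}$ | $\mu_1, \mu_2, \sigma_1, \sigma_2$; $\mu_1 - \mu_2, \theta_1, \theta_5, \theta_7, \theta_8, \theta_{10}$ | $\rho$; $\eta_3, \theta_2, \theta_3, \theta_9$
$\pi_{IJ} \propto \frac{1}{\sigma_1\sigma_2(1-\rho^2)^{3/2}}$ | $\mu_1, \mu_2$; $\mu_1 - \mu_2, \theta_1, \theta_3, \theta_7$ | $\sigma_1, \sigma_2, \rho$; $\eta_3, \theta_2, \theta_5, \theta_8, \theta_9, \theta_{10}$
$\pi_{R\rho} \propto \frac{1}{\sigma_1\sigma_2(1-\rho^2)}$ | $\mu_1, \mu_2, \rho$; $\mu_1 - \mu_2, \theta_3, \theta_7$ | $\sigma_1, \sigma_2$; $\eta_3, \theta_1, \theta_2, \theta_5, \theta_8, \theta_9, \theta_{10}$
$\tilde{\pi}_{R\sigma} \propto \frac{1}{\sigma_1\sigma_2(1-\rho^2)\sqrt{2-\rho^2}}$ | $\mu_1, \mu_2$; $\mu_1 - \mu_2, \eta_3, \theta_3, \theta_7$ | $\sigma_1, \sigma_2, \rho$; $\theta_1, \theta_2, \theta_5, \theta_8, \theta_9, \theta_{10}$
$\pi_{RO} \propto \frac{1}{\sigma_1^2\sigma_2(1-\rho^2)^{3/2}}$ | $\mu_1, \mu_2, \sigma_1$; $\mu_1 - \mu_2, \theta_1, \theta_5$ | $\sigma_2, \rho$; $\eta_3, \theta_2, \theta_3, \theta_7, \theta_8, \theta_9, \theta_{10}$
$\pi_{R\lambda} \propto \frac{1}{\sigma_1\sigma_2(1-\rho^2)\sqrt{(\sigma_1/\sigma_2 - \sigma_2/\sigma_1)^2 + 4\rho^2}}$ | $\mu_1, \mu_2$; $\mu_1 - \mu_2, \theta_3$ | $\sigma_1, \sigma_2, \rho$; $\eta_3, \theta_1, \theta_2, \theta_5, \theta_7, \theta_8, \theta_9, \theta_{10}$
$\pi_H \propto \frac{1}{\sigma_1^2(1-\rho^2)}$ | $\mu_1, \mu_2, \sigma_1, \rho$; $\mu_1 - \mu_2, \eta_3, \theta_1, \theta_2, \theta_3, \theta_4, \theta_5$ | $\sigma_2$; $\theta_7, \theta_8, \theta_9, \theta_{10}$
$\pi_S \propto \frac{1}{\sigma_1\sigma_2}$ | $\mu_1, \mu_2$; $\mu_1 - \mu_2, \theta_3, \theta_7$ | $\sigma_1, \sigma_2, \rho$; $\eta_3, \theta_1, \theta_2, \theta_5, \theta_8, \theta_9, \theta_{10}$
$\pi_{R\sigma} \propto \frac{\sqrt{1+\rho^2}}{\sigma_1\sigma_2(1-\rho^2)}$ | $\mu_1, \mu_2$; $\mu_1 - \mu_2, \theta_3, \theta_7, \theta_9$ | $\sigma_1, \sigma_2, \rho$; $\theta_1, \theta_2, \eta_3, \theta_5, \theta_8, \theta_{10}$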



Table 6 

Performance of objective priors for each of the parameters 



Prior 



Parameter 


Bad 


Medium 


Good 


Pi 




rest 


71 7JO, 77 77, 77 ,7 


Pi - P2 




rest 


71.7 , TTflO 


Ol 


717 J 


rest 


7177 , *K R\ , It MS 


02 


7177, 71 TiO, 717.7 


rest 


77.7 


/' 


7rj,7T/j,7rs,7rBo 




777? p , 777J CT , TYRX , 7177 , 7TM S 


Ai 


rest 


TtJ,llR\,TTRO 




03 = |S| 


7I"HO,1"J 


rest 


777,7, 7177 


07= ^ 


^H,^J,TTRO,^R\ 


rest 




09 =012 


i"./,7r/j (due to size) 


rest 


71 77, 71 R p , -KRc 
frequentist matching. Furthermore, with the exception of $\theta_5$ and $\eta_3$, the

nonboxed parameters have the indicated priors as one-at-a-time reference 
priors, so all three criteria point to the indicated recommendation. 

For $\rho$, we recommend using $\pi_{R\rho}$, since this prior is a one-at-a-time reference prior for $\rho$, first-order matching (as shown in Table 5), and has excellent numerical coverage as shown in Figure 1. Note that some might prefer to use the right-Haar prior because of its exact matching for $\rho$ (even though it exhibits a marginalization paradox). For $\theta_7$, the one-at-a-time reference prior was also $\pi_{R\rho}$. As this was first-order frequentist matching and among the best in terms of numerical coverage (see Figure 2), we also recommend it for this parameter.

For $\lambda_1$, the situation is unclear. The one-at-a-time reference prior is $\pi_{R\lambda}$ and is hence our recommendation, but first-order matching results for this parameter are not known, and the numerical coverages of all priors were rather bad. For $\sigma_{12}$, the only first-order matching prior among our candidates is $\pi_{R\sigma}$. It also had the best numerical coverages, and so is a clear recommendation. Note, however, that we were not able to determine if it is a one-at-a-time reference prior for $\sigma_{12}$, so the recommendation should be considered tentative.

The most interesting question is what to recommend for general use, as
an all-purpose prior. Looking at Table 2, it might seem that $\pi_H$ or even
$\pi_J$ would be good choices, since they are optimal for so many parameters.



Fig. 1. Frequentist coverages for $\rho$, where Case a: $(\mu_1, \mu_2, \sigma_1, \sigma_2) = (0, 0, 1, 1)$, and Case
b: $(\mu_1, \mu_2, \sigma_1, \sigma_2) = (0, 0, 2, 1)$. The x-axis is for $\rho \in (-1, 1)$.
Panels: Case a, $P(\rho > q_{0.05})$; Case b, $P(\rho > q_{0.05})$; Case a, $P(\rho < q_{0.95})$; Case b, $P(\rho < q_{0.95})$.
Fig. 2. Frequentist coverages for $\theta_7 = \sigma_2/\sigma_1$, where Case a: $(\mu_1, \mu_2, \sigma_1, \sigma_2) = (0, 0, 1, 1)$
and Case b: $(\mu_1, \mu_2, \sigma_1, \sigma_2) = (0, 0, 2, 1)$. The x-axis is for $\rho \in (-1, 1)$.
Panels: Case a, $P(\sigma_2/\sigma_1 < q_{0.95})$; Case a, $P(\sigma_2/\sigma_1 > q_{0.05})$; Case b, $P(\sigma_2/\sigma_1 < q_{0.95})$; Case b, $P(\sigma_2/\sigma_1 > q_{0.05})$.



However, both these priors can also give quite bad coverages, as indicated
in Figure 2 for $\pi_H$ and in Figures 1 and 2 for $\pi_J$. Indeed, from Table 6, the
only priors that did not have significantly poor performance for at least one
parameter (other than $\lambda_1$, for which no prior gave good coverages) were $\pi_{R\rho}$
and $\pi_{R\sigma}$. The numerical coverages for $\pi_{R\rho}$ and $\pi_{R\sigma}$ are virtually identical for
all the parameters, so there is no principled way to choose between them.
$\pi_{R\rho}$ is a commonly used prior and somewhat simpler, so it becomes our
recommended choice for a general prior.

5. Proofs. Due to space limitations, we give only the proofs of Theorems 
1, 2 and 8, because their proofs are quite different. The proofs of the other 
theorems in Section 4 are relatively easy consequences of Fact 1 and Lemmas 
1-3. For details of these other proofs, see [5]. 

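5.1. Proof of Theorem 1. With the constant prior for $(\mu_1, \mu_2)$, the marginal
likelihood of $(\sigma_1, \sigma_2, \rho)$ depends on $S$ and is proportional to

$$|\Sigma|^{-(n-1)/2} \exp\{-\tfrac{1}{2}\,\mathrm{trace}(S\Sigma^{-1})\}.$$

Define

$$G(X, \sigma_1, \sigma_2, \rho) = \int_{D} \pi(\sigma_1^*, \sigma_2^*, \rho^* \mid S)\, d\sigma_1^*\, d\sigma_2^*\, d\rho^*.$$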
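Clearly, the frequentist coverage probability is

$$P\{\theta < \theta_{1-\alpha}(X) \mid \mu_1, \mu_2, \sigma_1, \sigma_2, \rho\} = P\{G(S, \sigma_1, \sigma_2, \rho) < 1 - \alpha \mid \sigma_1, \sigma_2, \rho\}.$$

Under the prior (25),

$$G(X, \sigma_1, \sigma_2, \rho) = \frac{\displaystyle \iiint_{D} \frac{h(\rho^*)\exp(-0.5\,\mathrm{trace}(S\Sigma^{*-1}))}{(\sigma_1^*)^{n-1+c_1}\,(\sigma_2^*)^{n-1+c_2}\,(1-\rho^{*2})^{(n-1)/2}}\, d\sigma_1^*\, d\sigma_2^*\, d\rho^*}{\displaystyle \iiint \frac{h(\rho^*)\exp(-0.5\,\mathrm{trace}(S\Sigma^{*-1}))}{(\sigma_1^*)^{n-1+c_1}\,(\sigma_2^*)^{n-1+c_2}\,(1-\rho^{*2})^{(n-1)/2}}\, d\sigma_1^*\, d\sigma_2^*\, d\rho^*},$$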



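where $\Sigma^*$ is the $2 \times 2$ symmetric matrix whose diagonal elements are $\sigma_1^{*2}$
and $\sigma_2^{*2}$, and whose off-diagonal element is $\sigma_1^*\sigma_2^*\rho^*$. Denote $\Xi = \mathrm{diag}(1/\sigma_1, 1/\sigma_2)$
and make the transformations

$$T = \Xi S \Xi = \begin{pmatrix} s_{11}/\sigma_1^2 & s_{12}/(\sigma_1\sigma_2) \\ s_{12}/(\sigma_1\sigma_2) & s_{22}/\sigma_2^2 \end{pmatrix} \quad\text{and}\quad \Omega = \Xi \Sigma^* \Xi = \begin{pmatrix} \omega_1^2 & \omega_1\omega_2\rho^* \\ \omega_1\omega_2\rho^* & \omega_2^2 \end{pmatrix},$$

so that $\omega_i = \sigma_i^*/\sigma_i$ for $i = 1, 2$.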
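Clearly $\mathrm{trace}(S\Sigma^{*-1}) = \mathrm{trace}(T\Omega^{-1})$, and then

$$G(X, \sigma_1, \sigma_2, \rho) = \frac{\displaystyle \iiint_{\widetilde{D}} \frac{h(\rho^*)\exp(-0.5\,\mathrm{trace}(T\Omega^{-1}))}{\omega_1^{\,n-1+c_1}\,\omega_2^{\,n-1+c_2}\,(1-\rho^{*2})^{(n-1)/2}}\, d\omega_1\, d\omega_2\, d\rho^*}{\displaystyle \iiint \frac{h(\rho^*)\exp(-0.5\,\mathrm{trace}(T\Omega^{-1}))}{\omega_1^{\,n-1+c_1}\,\omega_2^{\,n-1+c_2}\,(1-\rho^{*2})^{(n-1)/2}}\, d\omega_1\, d\omega_2\, d\rho^*},$$

where $\widetilde{D} = \{(\omega_1, \omega_2, \rho^*) : \omega_1^{b_1}\omega_2^{b_2} g(\rho^*) < g(\rho)\}$. Since the sampling
distribution of $T$ depends only on $\rho$, so does the sampling distribution of $G(X, \sigma_1, \sigma_2, \rho)$.
Also $\widetilde{D}$ depends on $\rho$ only. The result thus holds.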
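
The rescaling step in this proof can be checked numerically. The following is a minimal sketch in Python with NumPy; the helper name `scaled_matrices` and all numerical values are illustrative assumptions, not part of the paper. It builds $T = \Xi S \Xi$ and $\Omega = \Xi \Sigma^* \Xi$ with $\Xi = \mathrm{diag}(1/\sigma_1, 1/\sigma_2)$ and compares the two traces.

```python
import numpy as np

def scaled_matrices(S, sigma, sigma_star, rho_star):
    """Return (T, Omega, Sigma_star) for Xi = diag(1/sigma_1, 1/sigma_2)."""
    Xi = np.diag(1.0 / np.asarray(sigma, dtype=float))
    s1, s2 = sigma_star
    Sigma_star = np.array([[s1**2, rho_star * s1 * s2],
                           [rho_star * s1 * s2, s2**2]])
    return Xi @ S @ Xi, Xi @ Sigma_star @ Xi, Sigma_star

# Arbitrary illustrative inputs (hypothetical values).
rng = np.random.default_rng(0)
A = rng.normal(size=(2, 5))
S = A @ A.T                      # a positive definite "sum of squares" matrix
sigma = (2.0, 0.5)               # true (sigma_1, sigma_2)
sigma_star = (1.3, 0.7)          # integration variables (sigma_1*, sigma_2*)
rho_star = -0.4

T, Omega, Sigma_star = scaled_matrices(S, sigma, sigma_star, rho_star)
lhs = np.trace(S @ np.linalg.inv(Sigma_star))   # trace(S Sigma*^{-1})
rhs = np.trace(T @ np.linalg.inv(Omega))        # trace(T Omega^{-1})

# Omega should have entries omega_i = sigma_i* / sigma_i.
omega = np.array(sigma_star) / np.array(sigma)
Omega_expected = np.array([[omega[0]**2, rho_star * omega[0] * omega[1]],
                           [rho_star * omega[0] * omega[1], omega[1]**2]])
```

The two traces agree because $T\Omega^{-1} = \Xi S \Sigma^{*-1} \Xi^{-1}$, which is similar to $S\Sigma^{*-1}$.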



5.2. Proof of Theorem 2. It follows from (18) and Lemma 3 (a) that 

P(p<P*l-a\tp) = P{U\ ' 



2* 
An— a 



Xn-6 r 

iT" Vi - r 2 

An— a 



1-a 



P 



Note that $\psi$, defined in (18), is invertible, and $\psi^{-1}(\rho) = \rho/\sqrt{1-\rho^2}$, for
$|\rho| < 1$. It follows from Lemma 3 (a) and (b) that



P(p<p* 1 _ a \£,p)=P 



-Z 



V X n*-b 



>0 



p* / l-a 
^22(1 -H) 



02 V 1 - P 2Z 3 + (p&2/(7l)Jsu 



P 



Xl-1 



^2 v/T^vS 



Consequently, 
P(p<P*i- a \£,p) 



P 



Z, 



Xn-l 



Z| 



2* 
An— a 



1-a 



This completes the proof of part (a). For part (b), if (26) equals $1 - \alpha$ for
any $-1 < \rho < 1$, choose $\rho = 0$ and get

Z 3 / Z,* 



P 



< 



Xn-2 



1 — a, 



which implies that $b = 2$. Substituting $b = 2$ into (26) shows that $a = 1$.
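
The conclusion that matching is exact when $a = 1$ and $b = 2$ can be explored by simulation. The sketch below is in Python with NumPy and rests on a stated assumption: that the constructive posterior in (18), which is not reproduced in this section, has the form $\rho^* = \psi(Y^*)$ with $Y^* = -Z^*/\chi^*_{n-a} + (\chi^*_{n-b}/\chi^*_{n-a})\, r/\sqrt{1-r^2}$ and $\psi(y) = y/\sqrt{1+y^2}$. The function names, sample size and replication counts are hypothetical choices made for illustration.

```python
import numpy as np

def posterior_rho_draws(r, n, a, b, size, rng):
    """Draw rho* = psi(Y*) under the ASSUMED form of representation (18)."""
    z = rng.standard_normal(size)
    chi_a = np.sqrt(rng.chisquare(n - a, size))
    chi_b = np.sqrt(rng.chisquare(n - b, size))
    y = -z / chi_a + (chi_b / chi_a) * r / np.sqrt(1.0 - r**2)
    return y / np.sqrt(1.0 + y**2)          # psi(y)

def coverage_upper(rho, n, a, b, alpha=0.05, reps=2000, draws=2000, seed=0):
    """Monte Carlo estimate of P(rho < posterior (1 - alpha)-quantile)."""
    rng = np.random.default_rng(seed)
    cov = np.array([[1.0, rho], [rho, 1.0]])
    hits = 0
    for _ in range(reps):
        x = rng.multivariate_normal(np.zeros(2), cov, size=n)
        r = np.corrcoef(x[:, 0], x[:, 1])[0, 1]     # sample correlation
        q = np.quantile(posterior_rho_draws(r, n, a, b, draws, rng), 1 - alpha)
        hits += int(rho < q)
    return hits / reps

# a = 1, b = 2 is the case singled out by the theorem.
cov_haar = coverage_upper(rho=0.5, n=10, a=1, b=2)
```

Under the stated assumption, `cov_haar` should sit close to the nominal 0.95 up to Monte Carlo error; other choices of `a` and `b` can be passed in to compare.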



5.3. Proof of Theorem 8. Part (a) is obvious. For part (b), since $\bar{x}_1 = \mu_1 +
Z_1\sigma_1/\sqrt{n}$ and $Z_1$ and $\chi^2_{n-1}$ are independent, we have



(0 5 < (0 5 *)!_ a ) 



+ #5 



y 2 * 

An— a 
Xn— 1 



Y 2 * 
An— a 



n V Xn-l 



>0 



1-0 



It follows from Lemma 3 (a) and (b) that 



zi 



2* 

An — 

z. 



+ 



n 



n 



Xl-i 



n 



Xn- 



1 



< 



A 2 -i 



2* 

An— a 

z\ 

An— a 



+ 



z. 



xl-i 



>0 



1-a 



2* / i_ a 
An— a 



Because $Z_1$ and $-Z_1$ have the same distribution and $Z_1$ and $\chi^2_{n-1}$ are
independent, (32) holds. If (32) equals $1 - \alpha$ for any $\theta_5$, choose $\theta_5 = 0$,

1-a/ 



P 



xi-i 



2* 

A.n—a 



1 



a. 



which implies that a = 1. The result holds. 
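
A similar simulation illustrates why $a = 1$ is singled out. This is a hedged sketch in Python with NumPy, assuming that $\theta_5 = \mu_1/\sigma_1$ and that its posterior can be generated as $\theta_5^* = Z^*/\sqrt{n} + \chi^*_{n-a}\,\bar{x}_1/\sqrt{s_{11}}$; this representation is an assumption used for illustration and is not quoted from this section. On the sampling side, $\bar{x}_1/\sqrt{s_{11}} = (\theta_5 + Z_1/\sqrt{n})/\chi_{n-1}$. The function name and settings are hypothetical.

```python
import numpy as np

def coverage_theta5(theta5, n, a, alpha=0.05, reps=2000, draws=2000, seed=0):
    """Monte Carlo estimate of P(theta5 < posterior (1 - alpha)-quantile)."""
    rng = np.random.default_rng(seed)
    # Sampling side: xbar_1 / sqrt(s_11) = (theta5 + Z_1 / sqrt(n)) / chi_{n-1}.
    z1 = rng.standard_normal(reps)
    chi = np.sqrt(rng.chisquare(n - 1, reps))
    w = (theta5 + z1 / np.sqrt(n)) / chi
    # Posterior side (ASSUMED form): theta5* = Z*/sqrt(n) + chi*_{n-a} * w.
    z_star = rng.standard_normal((reps, draws))
    chi_star = np.sqrt(rng.chisquare(n - a, (reps, draws)))
    theta_star = z_star / np.sqrt(n) + chi_star * w[:, None]
    q = np.quantile(theta_star, 1 - alpha, axis=1)
    return float(np.mean(theta5 < q))

cov_a1 = coverage_theta5(0.0, n=5, a=1)   # degrees of freedom match the sampling side
cov_a3 = coverage_theta5(0.0, n=5, a=3)   # mismatched degrees of freedom
```

With $\theta_5 = 0$ the comparison reduces to $Z^*/\chi^*_{n-a}$ against $-Z_1/\chi_{n-1}$, so under these assumptions the estimate for `a=1` should be near 0.95 while `a=3` should over-cover noticeably.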